knitr::opts_chunk$set(error = TRUE)

Packages

We will need the 'vegan' package to test our function.

library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.5-2

Write your own function to measure pairwise distance/dissimilarity

Write a function that computes your favorite measures of similarity/dissimilarity (include at least two). The function should measure pairwise similarity/dissimilarity for two vectors (i.e., function(x, y)). Try to build in some warnings and error messages so that you (and your friends) are alerted when something goes wrong.
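
One possible starting point is sketched below. It is not the function that produced the output shown in this document; the name pair_dissim, the choice of Jaccard and Bray-Curtis as the two measures, and the wording of the messages are all assumptions made for illustration. It returns NA for the abundance-based measure when the input looks like presence-absence data, which is one reasonable convention among several.

```r
# Hypothetical sketch: two dissimilarity measures for a pair of samples.
pair_dissim <- function(x, y) {
  # Input validation: fail early with informative, neutral messages
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("'x' and 'y' must both be numeric vectors.")
  }
  if (length(x) != length(y)) {
    stop("'x' and 'y' must have the same length (got ", length(x),
         " and ", length(y), ").")
  }
  if (anyNA(x) || anyNA(y)) {
    stop("Missing values are not allowed in 'x' or 'y'.")
  }
  if (any(x < 0) || any(y < 0)) {
    stop("Abundances must be non-negative.")
  }

  # Presence-absence measure: Jaccard dissimilarity
  shared  <- sum(x > 0 & y > 0)   # taxa present in both samples
  present <- sum(x > 0 | y > 0)   # taxa present in at least one sample
  if (present == 0) {
    stop("Both samples are empty; dissimilarity is undefined.")
  }
  jaccard <- 1 - shared / present

  # Abundance measure: Bray-Curtis dissimilarity.
  # Assumption of this sketch: 0/1 input is treated as presence-absence
  # data and the abundance-based measure is reported as NA with a warning.
  if (all(c(x, y) %in% c(0, 1))) {
    warning("Binary data: abundance-based measures are returned as NA.")
    bray <- NA_real_
  } else {
    bray <- sum(abs(x - y)) / sum(x + y)
  }

  c(jaccard = jaccard, bray_curtis = bray)
}

pair_dissim(c(0, 15, 3, 42, 0, 0, 1, 12), c(7, 11, 0, 0, 0, 32, 78, 6))
```

Returning a named vector keeps the sketch short; a data frame with one row per measure would be closer to a printed report.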

Now create some test data to make sure that your function does what you want it to do.
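
Test data are most useful when the correct answer is known in advance. The sketch below uses a hypothetical one-line helper, bray_curtis (an assumption, not part of this document's function), and three small pairs of vectors whose Bray-Curtis dissimilarity can be worked out by hand: identical samples, samples with no taxa in common, and an intermediate case.

```r
# Hypothetical helper: Bray-Curtis dissimilarity for two abundance vectors
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Identical samples: no differences, so the dissimilarity should be 0
identical_x <- c(5, 0, 2)
identical_y <- c(5, 0, 2)

# No shared taxa: every specimen counts as a difference, so it should be 1
disjoint_x <- c(4, 0, 0)
disjoint_y <- c(0, 3, 1)

# Intermediate case: differences sum to 2, total abundance is 6, so 1/3
partial_x <- c(2, 2)
partial_y <- c(2, 0)

# stopifnot() is silent when every check passes and errors otherwise
stopifnot(
  isTRUE(all.equal(bray_curtis(identical_x, identical_y), 0)),
  isTRUE(all.equal(bray_curtis(disjoint_x, disjoint_y), 1)),
  isTRUE(all.equal(bray_curtis(partial_x, partial_y), 1 / 3))
)
```

The same pattern applies to any measure your own function reports: pick inputs at the extremes and one case in between.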

Do your warnings and error messages work?

# Does your function tell users when they tried to use data with missing values?
sample1 <- c(0, 3, NA)
sample2 <- c(9, 4, 12)
mydist(sample1,sample2)
## Error in mydist(sample1, sample2): NO! NO!! NO!!! missing values are not allowed
# Does it stop users who try to use two vectors of different lengths?
sample1 <- c(0, 3, 4, 56)
sample2 <- c(9, 4, 12)
mydist(sample1,sample2)
## Error in mydist(sample1, sample2): What are you doing??!! You cannot use vectors of unequal length! Gee!
# Does it warn users that in case of binary (presence-absence) data certain estimates cannot be computed?
sample1 <- c(0, 1, 1)
sample2 <- c(1, 1, 0)
mydist(sample1,sample2)
## Warning in mydist(sample1, sample2): binary data: missing values generated
## for measures that require abundance data
##                                             parameter
##  1. Shared taxa                             1.0000000
##  2. Present in sample 1 only                1.0000000
##  3. Present in sample 2 only                1.0000000
##  4. Shared absences                         0.0000000
##  5. Total number of species present         3.0000000
##  6. Total number of specimens                      NA
##  7. Total number of specimens in sample 1          NA
##  8. Total number of specimens in sample 2          NA
##  9. Total number of occurrences             4.0000000
## 10. total number of occurrences in sample 1 2.0000000
## 11. Total number of occurrences in sample 2 2.0000000
## 12. Simple Matching Coefficient             0.3333333
## 13. Jaccard Similarity                      0.3333333
## 14. Sorenson Similarity                     0.5000000
## 15. Jaccard Dissimilarity                   0.6666667
## 16. Sorenson Dissimilarity                  0.5000000
## 17. Forbes-Alroy Similarity                 0.7593088
## 18. Percentage Similarity                          NA
## 19. Bray Curtis Dissimilarity                      NA
## 20. Jaccard-Chao Similarity                        NA
## 21. Jaccard-Chao Similarity Adj                    NA
## 22. Sorenson-Chao Similarity                       NA
## 23. Sorenson-Chao Similarity Adj                   NA
## 24. Jaccard-Chao Dissimilarity                     NA
## 25. Jaccard-Chao Dissimilarity Adj                 NA
## 26. Sorenson-Chao Dissimilarity                    NA
## 27. Sorenson-Chao Dissimilarity Adj                NA

Obviously, a professionally written R function would never use error messages that needlessly insult its users.
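
As a rough illustration of a more neutral style, the sketch below states what went wrong and what the function expected. The validator name check_pair and the message wording are hypothetical. Wrapping the call in tryCatch() captures the message as a string, so the wording can be inspected or tested without halting the script.

```r
# Hypothetical validator with neutral, actionable messages
check_pair <- function(x, y) {
  if (length(x) != length(y)) {
    # call. = FALSE drops the "Error in check_pair(...)" prefix
    stop("`x` and `y` must have the same length; got ", length(x),
         " and ", length(y), ".", call. = FALSE)
  }
  if (anyNA(x) || anyNA(y)) {
    stop("Missing values are not allowed; remove or impute NAs first.",
         call. = FALSE)
  }
  invisible(TRUE)
}

# Capture the error message instead of stopping execution
msg_length <- tryCatch(check_pair(c(0, 3, 4, 56), c(9, 4, 12)),
                       error = function(e) conditionMessage(e))
msg_na <- tryCatch(check_pair(c(0, 3, NA), c(9, 4, 12)),
                   error = function(e) conditionMessage(e))
msg_length
msg_na
```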

Now test it against a well-established function such as vegdist() from the vegan package.

# Most important of all: does it compute the parameters correctly?
sample1 <- c(0,15,3,42,0,0,1,12)
sample2 <- c(7,11,0,0,0,32,78,6)
mydist(sample1,sample2)
##                                               parameter
##  1. Shared taxa                               3.0000000
##  2. Present in sample 1 only                  2.0000000
##  3. Present in sample 2 only                  2.0000000
##  4. Shared absences                           1.0000000
##  5. Total number of species present           7.0000000
##  6. Total number of specimens               207.0000000
##  7. Total number of specimens in sample 1    73.0000000
##  8. Total number of specimens in sample 2   134.0000000
##  9. Total number of occurrences              10.0000000
## 10. total number of occurrences in sample 1   5.0000000
## 11. Total number of occurrences in sample 2   5.0000000
## 12. Simple Matching Coefficient               0.5000000
## 13. Jaccard Similarity                        0.4285714
## 14. Sorenson Similarity                       0.6000000
## 15. Jaccard Dissimilarity                     0.5714286
## 16. Sorenson Dissimilarity                    0.4000000
## 17. Forbes-Alroy Similarity                   0.8282635
## 18. Percentage Similarity                     0.1739130
## 19. Bray Curtis Dissimilarity                 0.8260870
## 20. Jaccard-Chao Similarity                   0.3313816
## 21. Jaccard-Chao Similarity Adj               0.3829736
## 22. Sorenson-Chao Similarity                  0.4978011
## 23. Sorenson-Chao Similarity Adj              0.5538408
## 24. Jaccard-Chao Dissimilarity                0.6686184
## 25. Jaccard-Chao Dissimilarity Adj            0.6170264
## 26. Sorenson-Chao Dissimilarity               0.5021989
## 27. Sorenson-Chao Dissimilarity Adj           0.4461592
vegdist(rbind(sample1,sample2), 'chao')
##           sample1
## sample2 0.6170264
vegdist(rbind(sample1,sample2), 'bray')
##          sample1
## sample2 0.826087
vegdist(rbind(sample1,sample2), 'jaccard', binary = TRUE)
##           sample1
## sample2 0.5714286
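
The three vegdist() results agree with rows 25 (Jaccard-Chao Dissimilarity Adj), 19 (Bray Curtis Dissimilarity) and 15 (Jaccard Dissimilarity) of the mydist() output. As an independent check, two of them can also be recomputed in a few lines of base R. The sketch below uses the standard textbook formulas; the variable names are illustrative, and the comparison with vegdist() runs only if vegan is installed.

```r
sample1 <- c(0, 15, 3, 42, 0, 0, 1, 12)
sample2 <- c(7, 11, 0, 0, 0, 32, 78, 6)

# Bray-Curtis: sum of absolute differences divided by total abundance
bray_by_hand <- sum(abs(sample1 - sample2)) / sum(sample1 + sample2)

# Jaccard (presence-absence): 1 - shared taxa / taxa present in either sample
shared <- sum(sample1 > 0 & sample2 > 0)
either <- sum(sample1 > 0 | sample2 > 0)
jaccard_by_hand <- 1 - shared / either

# Compare with the values printed above, to their printed precision
stopifnot(
  isTRUE(all.equal(bray_by_hand, 0.826087, tolerance = 1e-6)),
  isTRUE(all.equal(jaccard_by_hand, 0.5714286, tolerance = 1e-6))
)

# Optional cross-check against vegan when it is available
if (requireNamespace("vegan", quietly = TRUE)) {
  m <- rbind(sample1, sample2)
  stopifnot(
    isTRUE(all.equal(as.numeric(vegan::vegdist(m, "bray")), bray_by_hand)),
    isTRUE(all.equal(as.numeric(vegan::vegdist(m, "jaccard", binary = TRUE)),
                     jaccard_by_hand))
  )
}
```

Using all.equal() rather than == avoids spurious failures from floating-point rounding.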

Cite packages

citation('vegan')
## 
## To cite package 'vegan' in publications use:
## 
##   Jari Oksanen, F. Guillaume Blanchet, Michael Friendly, Roeland
##   Kindt, Pierre Legendre, Dan McGlinn, Peter R. Minchin, R. B.
##   O'Hara, Gavin L. Simpson, Peter Solymos, M. Henry H. Stevens,
##   Eduard Szoecs and Helene Wagner (2018). vegan: Community Ecology
##   Package. R package version 2.5-2.
##   https://CRAN.R-project.org/package=vegan
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {vegan: Community Ecology Package},
##     author = {Jari Oksanen and F. Guillaume Blanchet and Michael Friendly and Roeland Kindt and Pierre Legendre and Dan McGlinn and Peter R. Minchin and R. B. O'Hara and Gavin L. Simpson and Peter Solymos and M. Henry H. Stevens and Eduard Szoecs and Helene Wagner},
##     year = {2018},
##     note = {R package version 2.5-2},
##     url = {https://CRAN.R-project.org/package=vegan},
##   }
## 
## ATTENTION: This citation information has been auto-generated from
## the package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.